Austrian Hotels Dataset

Overview

This dataset contains realistic simulated data on hotels across Austria, designed for practicing data wrangling and table joins. The dataset consists of multiple related tables that can be combined using various join operations.

Used in: Week 4 (Joining Tables)

Generated by: Claude AI (Sonnet 3.7) with realistic relationships between variables

Dataset Structure

The dataset includes 8 related tables with hotels across Austrian cities, covering occupancy, pricing, tourism statistics, and economic indicators.

Core Tables

File Description Rows Key Columns
hotels.csv Basic hotel information 200 hotel_id (PK)
cities.csv City information 10 city (PK)
monthly_occupancy.csv Monthly hotel performance metrics ~3,800 hotel_id, month, year
city_tourism.csv Monthly tourism statistics by city 240 city, month, year
economic_indicators.csv Monthly economic indicators 24 month, year
reviews.csv Hotel guest reviews ~1,700 review_id (PK), hotel_id (FK)
amenities.csv List of possible hotel amenities 10 amenity_id (PK)
hotel_amenities.csv Hotel-amenity relationships ~1,000 hotel_id, amenity_id

Key Relationships

  • One-to-One: Hotels ↔︎ Cities (through city name)
  • One-to-Many: Hotels → Monthly Occupancy, Hotels → Reviews
  • Many-to-Many: Hotels ↔︎ Amenities (through hotel_amenities)
  • Composite Keys: Monthly data requires (hotel_id, month, year) or (city, month, year)

Documentation

  • hotel-data-readme.md - Detailed schema documentation with column descriptions and data types

Learning Objectives

This dataset allows students to practice: - Inner, left, right, and full joins - One-to-one and one-to-many relationships - Composite key joins - Data aggregation after joins - Handling missing values in joins

Sample Research Questions

  • How do hotel prices vary by city and season?
  • What’s the relationship between amenities and guest ratings?
  • How do economic indicators affect hotel occupancy rates?
  • Which cities have the highest tourism-to-hotel capacity ratios?